Gracob: a novel graph-based constant-column biclustering method for mining growth phenotype data
نویسندگان
چکیده
Motivation Growth phenotype profiling of genome-wide gene-deletion strains over stress conditions can offer a clear picture that the essentiality of genes depends on environmental conditions. Systematically identifying groups of genes from such high-throughput data that share similar patterns of conditional essentiality and dispensability under various environmental conditions can elucidate how genetic interactions of the growth phenotype are regulated in response to the environment. Results We first demonstrate that detecting such 'co-fit' gene groups can be cast as a less well-studied problem in biclustering, i.e. constant-column biclustering. Despite significant advances in biclustering techniques, very few were designed for mining in growth phenotype data. Here, we propose Gracob, a novel, efficient graph-based method that casts and solves the constant-column biclustering problem as a maximal clique finding problem in a multipartite graph. We compared Gracob with a large collection of widely used biclustering methods that cover different types of algorithms designed to detect different types of biclusters. Gracob showed superior performance on finding co-fit genes over all the existing methods on both a variety of synthetic data sets with a wide range of settings, and three real growth phenotype datasets for E. coli, proteobacteria and yeast. Availability and Implementation Our program is freely available for download at http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact [email protected]. Supplementary information Supplementary data are available at Bioinformatics online.
منابع مشابه
DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach
Biclustering algorithms refer to a distinct class of clustering algorithms that perform simultaneous row-column clustering. Biclustering problems arise in DNAmicroarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and so forth. When dealing with DNA microarray experimental data for example, the goal of bicluste...
متن کاملEfficient Mining Frequent Closed Discriminative Biclusters by Sample-Growth: The FDCluster Approach
DNA microarray technology has generated a large number of gene expression data. Biclustering is a methodology allowing for condition set and gene set points clustering simultaneously. It finds clusters of genes possessing similar characteristics together with biological conditions creating these similarities. Almost all the current biclustering algorithms find bicluster in one microarray datase...
متن کاملcHawk: An Efficient Biclustering Algorithm based on Bipartite Graph Crossing Minimization
Biclustering is a very useful data mining technique for gene expression analysis and profiling. It helps identify patterns where different genes are co-related based on a subset of conditions. Bipartite Spectral partitioning is a powerful technique to achieve biclustering but its computation complexity is prohibitive for applications dealing with large input data. We provide a connection betwee...
متن کاملBiCross : A Biclustering Technique for Gene Expression Data using One Layer Fixed Weighted Bipartite Graph Crossing Minimization
Biclustering has become an important data mining technique for microarray gene expression analysis and profiling, as it provides a local view of the hidden relationships in data, unlike a global view provided by conventional clustering techniques. This technique, in contrast to the conventional clustering techniques, helps in identifying a subset of the genes and a subset of the experimental co...
متن کاملA new GRASP metaheuristic for biclustering of gene expression data
The term biclustering stands for simultaneous clustering of both genes and conditions. This task has generated considerable interest over the past few decades, particularly related to the analysis of high-dimensional gene expression data in information retrieval, knowledge discovery, and data mining [1]. Since the problem has been shown to be NP-complete, we have recently designed and implement...
متن کامل